15 research outputs found

    Privacy Preservation by Disassociation

    In this work, we focus on protection against identity disclosure in the publication of sparse multidimensional data. Existing multidimensional anonymization techniques (a) protect the privacy of users either by altering the set of quasi-identifiers of the original data (e.g., by generalization or suppression) or by adding noise (e.g., using differential privacy), and/or (b) assume a clear distinction between sensitive and non-sensitive information and sever the possible linkage. In many real-world applications the above techniques are not applicable. For instance, consider web search query logs. Suppressing or generalizing anonymization methods would remove the most valuable information in the dataset: the original query terms. Additionally, web search query logs contain millions of query terms which cannot be categorized as sensitive or non-sensitive, since a term may be sensitive for one user and non-sensitive for another. Motivated by this observation, we propose an anonymization technique termed disassociation that preserves the original terms but hides the fact that two or more different terms appear in the same record. We protect the users' privacy by disassociating record terms that participate in identifying combinations, so that an adversary cannot associate a record with a rare combination of terms with high probability. To the best of our knowledge, our proposal is the first to employ such a technique to provide protection against identity disclosure. We propose an anonymization algorithm based on our approach and evaluate its performance on real and synthetic datasets, comparing it against other state-of-the-art methods based on generalization and differential privacy.
    Comment: VLDB201
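    The core idea can be illustrated with a toy sketch (this is not the paper's actual partitioning algorithm; the greedy chunking below and the parameter k are simplifying assumptions): terms that co-occur in at least k records may be published together in the same chunk, while rarer combinations are split across chunks so they can no longer be linked.

```python
from itertools import combinations
from collections import Counter

def disassociate(records, k=2):
    """Toy illustration of disassociation: split each record's terms
    into chunks so that no published chunk contains a pair of terms
    that co-occurs in fewer than k records."""
    # Count how often each term pair appears together across records.
    pair_freq = Counter()
    for rec in records:
        for pair in combinations(sorted(rec), 2):
            pair_freq[pair] += 1

    def safe(chunk, term):
        # A term may join a chunk only if every pair it forms is frequent.
        return all(pair_freq[tuple(sorted((t, term)))] >= k for t in chunk)

    result = []
    for rec in records:
        chunks = []
        for term in sorted(rec):
            for chunk in chunks:
                if safe(chunk, term):
                    chunk.append(term)
                    break
            else:
                chunks.append([term])  # rare combination: start a new chunk
        result.append(chunks)
    return result
```

    The published output keeps every original term, but an adversary can no longer tell which chunks of a record belong together, hiding rare identifying combinations.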

    In support of workload-aware streaming state management

    Modern distributed stream processors predominantly rely on LSM-based key-value stores to manage the state of long-running computations. We question the suitability of such general-purpose stores for streaming workloads and argue that they incur unnecessary overheads in exchange for state management capabilities. Since streaming operators are instantiated once and are long-running, state types, sizes, and access patterns can either be inferred at compile time or learned during execution. This paper surfaces the limitations of established practices for streaming state management and advocates for configurable streaming backends, tailored to the state requirements of each operator. Using workload-aware state management, we achieve an order of magnitude improvement in p99 latency and 2x higher throughput.
    https://www.usenix.org/system/files/hotstorage20_paper_kalavri.pdf
    Published version
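    A minimal sketch of the idea, assuming an operator declares its access pattern up front (the backend classes and the "point"/"range" labels are hypothetical, not the paper's API): the runtime picks a state backend per operator instead of a one-size-fits-all store.

```python
import bisect

class HashState:
    """Point-access state: O(1) get/put, suited to keyed counters."""
    def __init__(self):
        self._d = {}
    def put(self, k, v):
        self._d[k] = v
    def get(self, k):
        return self._d.get(k)

class SortedState:
    """Range-access state: keeps keys ordered for window scans."""
    def __init__(self):
        self._keys, self._vals = [], {}
    def put(self, k, v):
        if k not in self._vals:
            bisect.insort(self._keys, k)
        self._vals[k] = v
    def range(self, lo, hi):
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return [(k, self._vals[k]) for k in self._keys[i:j]]

def make_state(access_pattern):
    """Choose a backend from the operator's declared access pattern,
    which for long-running streaming operators is known at compile
    time or learned during execution."""
    return SortedState() if access_pattern == "range" else HashState()
```

    A windowed aggregation would request a "range" backend, while a keyed counter gets the cheaper hash backend, avoiding LSM overheads such as compaction for state that never needs ordered scans.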

    TVA: A multi-party computation system for secure and expressive time series analytics

    We present TVA, a multi-party computation (MPC) system for secure analytics on secret-shared time series data. TVA achieves strong security guarantees in the semi-honest and malicious settings, and high expressivity by enabling complex analytics on inputs with unordered and irregular timestamps. TVA is the first system to support arbitrary composition of oblivious window operators, keyed aggregations, and multiple filter predicates, while keeping all data attributes private, including record timestamps and user-defined values in query predicates. At the core of the TVA system lie novel protocols for secure window assignment: (i) a tumbling window protocol that groups records into fixed-length time buckets and (ii) two session window protocols that identify periods of activity followed by periods of inactivity. We also contribute a new protocol for secure division with a public divisor, which may be of independent interest. We evaluate TVA on real LAN and WAN environments and show that it can efficiently compute complex window-based analytics on inputs of 2^22 records with modest use of resources. When compared to the state-of-the-art, TVA achieves up to 5.8× lower latency in queries with multiple filters and two orders of magnitude better performance in window aggregation.
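    The tumbling-window idea is easiest to see in the clear: each record's window id is its timestamp divided by the window length. The sketch below is a plaintext analogue only; TVA's protocol computes these assignments obliviously on secret shares.

```python
from collections import defaultdict

def tumbling_sum(records, length):
    """Group (timestamp, value) records into fixed-length time buckets
    and sum each bucket. Timestamps may be unordered and irregular.
    Plaintext analogue of TVA's oblivious tumbling-window protocol."""
    sums = defaultdict(int)
    for t, v in records:
        sums[t // length] += v  # window id = floor(t / length)
    return dict(sums)
```

    Session windows are harder: they require comparing gaps between consecutive timestamps to detect inactivity, which is why TVA contributes dedicated protocols for them.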

    Web Data Management with Applications in Privacy

    Privacy preservation in data publishing has gained considerable attention in recent years, due to the need of many organizations to share their data without revealing sensitive information about the real persons or legal entities they describe. Representative examples of such datasets are the customer databases kept by enterprises, the data produced by clinical tests and experiments conducted in hospitals and related institutes, the query logs held by search-engine providers such as Google, the financial data from public-sector information systems, the social data produced by the participation of individuals in social networks such as Facebook, and the location data from telecom providers such as Vodafone. On the one hand, the records in these datasets carry valuable information not only for their owners but also for a plethora of enterprises, universities, and institutions: the evaluation of proposed information retrieval and data mining techniques through realistic experiments, the ability to perform large-scale market analysis, and the feasibility of conducting medical, social, and multidisciplinary studies based on these datasets are only a few examples that demonstrate the importance of making them publicly available. On the other hand, since the actual data are produced by the activities of real people, usually performed in private, they are likely to capture sensitive information about them (e.g., medical and financial information, political beliefs, sexual preferences) that can be disclosed even if the data are published without the attributes that directly identify an entity, such as a person's name, social ID number, or the IP address of a web server. Even without these attributes, sensitive information about an individual, or an entity in general, can be disclosed when: (i) the adversary has some a priori background knowledge about the entity (e.g., knows a person's music preferences), and/or (ii) the published dataset is cross-checked with other publicly available data sources (e.g., demographic data, social profiles extracted from personal web pages, etc.).
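    Case (ii), the linkage attack, can be sketched concretely (all field names and data below are hypothetical): a "de-identified" dataset is joined with a public one on quasi-identifiers, and any record matching a unique public entry is re-identified.

```python
def reidentify(published, public):
    """Join a de-identified dataset with a public one on the
    quasi-identifiers (zip, birthdate, sex). A published record that
    matches exactly one public entry is re-identified, linking a
    name to a sensitive attribute. Illustrative sketch only."""
    index = {}
    for person in public:
        key = (person["zip"], person["birthdate"], person["sex"])
        index.setdefault(key, []).append(person["name"])

    hits = []
    for rec in published:
        key = (rec["zip"], rec["birthdate"], rec["sex"])
        names = index.get(key, [])
        if len(names) == 1:  # unique match => identity disclosed
            hits.append((names[0], rec["diagnosis"]))
    return hits
```

    When two or more public entries share the same quasi-identifier combination, the match is ambiguous and no identity is disclosed, which is exactly the intuition behind k-anonymity-style protections.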

    Secrecy: Secure collaborative analytics on secret-shared data

    We study the problem of composing and optimizing relational query plans under secure multi-party computation (MPC). MPC enables mutually distrusting parties to jointly compute arbitrary functions over private data, while preserving data privacy from each other and from external entities. In this paper, we propose a relational MPC framework based on replicated secret sharing. We define a set of oblivious operators, explain the secure primitives they rely on, and provide an analysis of their costs in terms of operations and inter-party communication. We show how these operators can be composed to form end-to-end oblivious queries, and we introduce logical and physical optimizations that dramatically reduce the space and communication requirements during query execution, in some cases from quadratic to linear with respect to the cardinality of the input. We provide an efficient implementation of our framework, called Secrecy, and evaluate it using real queries from several MPC application areas. Our results demonstrate that the optimizations we propose can result in up to 1000× lower execution times compared to baseline approaches, enabling Secrecy to outperform state-of-the-art frameworks and compute MPC queries on millions of input rows with a single thread per party.
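    The replicated-secret-sharing substrate can be sketched in a few lines (a generic three-party scheme over a 32-bit ring, assumed here for illustration; the paper's concrete protocols are more involved): a value is split into three additive shares, each party holds two of them, and addition is a purely local operation with no communication.

```python
import secrets

MOD = 2 ** 32  # arithmetic over the ring Z_{2^32}

def share(x):
    """Split x into three additive shares mod 2^32; under replicated
    secret sharing, party i holds the pair (s[i], s[(i+1) % 3])."""
    s1 = secrets.randbelow(MOD)
    s2 = secrets.randbelow(MOD)
    s3 = (x - s1 - s2) % MOD
    s = [s1, s2, s3]
    return [(s[i], s[(i + 1) % 3]) for i in range(3)]

def reconstruct(parties):
    """Recover the secret from the first component of each party's pair."""
    return sum(p[0] for p in parties) % MOD

def add(a, b):
    """Secure addition is local: each party adds its pairs componentwise,
    with no inter-party communication."""
    return [((a[i][0] + b[i][0]) % MOD, (a[i][1] + b[i][1]) % MOD)
            for i in range(3)]
```

    Multiplication, by contrast, requires a round of communication between parties, which is why the paper's cost analysis counts inter-party communication alongside local operations.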